Skip to content

Guard runtime SET UTF-8 suffix parsing#22301

Open
Sean-Kenneth-Doherty wants to merge 1 commit into
apache:mainfrom
Sean-Kenneth-Doherty:codex/runtime-set-nonascii
Open

Guard runtime SET UTF-8 suffix parsing#22301
Sean-Kenneth-Doherty wants to merge 1 commit into
apache:mainfrom
Sean-Kenneth-Doherty:codex/runtime-set-nonascii

Conversation

@Sean-Kenneth-Doherty
Copy link
Copy Markdown

@Sean-Kenneth-Doherty Sean-Kenneth-Doherty commented May 17, 2026

Which issue does this PR close?

Rationale for this change

Runtime SET datafusion.runtime.* parsing split the last byte off capacity and duration values with split_at(value.len() - 1). For values ending in a multi-byte UTF-8 character, that byte index can fall inside the final character and panic before DataFusion can return a planner error.

What changes are included in this PR?

  • Adds a small UTF-8-aware helper that splits a config value at the start of its final character.
  • Reuses that helper for deprecated memory limit parsing, current capacity limit parsing, and duration segment parsing.
  • Adds unit coverage for non-ASCII invalid suffixes.
  • Adds SQL runtime coverage for all reported reachable runtime keys.

Are these changes tested?

Yes.

  • cargo test -p datafusion --lib test_parse_
  • cargo test -p datafusion --test core_integration test_runtime_config_non_ascii_suffix_returns_error
  • cargo fmt --check
  • cargo clippy -p datafusion --lib --test core_integration -- -D warnings
  • git diff --check

Are there any user-facing changes?

Yes. Malformed runtime config values ending in non-ASCII characters now return planner errors instead of panicking.

Scope note

The change is limited to avoiding byte-boundary panics while preserving the existing parser/error paths for malformed runtime config values.

@github-actions github-actions Bot added the core Core DataFusion crate label May 17, 2026
@Sean-Kenneth-Doherty
Copy link
Copy Markdown
Author

Validated head df3afaa locally. The patch targets the old byte-index slicing path, and the regression now returns normal planning errors for non-ASCII suffixes instead of panicking.

Commands run:

  • cargo fmt --check --all
  • git diff --check origin/main...HEAD
  • git diff --check
  • cargo test -p datafusion --lib test_parse_ (4 passed)
  • cargo test -p datafusion --test core_integration test_runtime_config_non_ascii_suffix_returns_error (1 passed)
  • cargo test -p datafusion --test core_integration runtime_config (12 passed)

Working tree remained clean after validation. Hosted checks currently show the labeler Process check passing; no broader CI jobs have been scheduled on the PR yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

panic on SET datafusion.runtime.* when value ends with non-ASCII byte (split_at char boundary)

1 participant